ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
Gui, Xin, Zhu, King, Ren, JinCheng, Chen, Qianben, Wang, Zekun Moore, LI, Yizhi, Liu, Xinpeng, Li, Xiaowan, Ren, Wenli, Miao, Linyu, Qin, Tianrui, Shu, Ziqi, Zhu, He, Tang, Xiangru, Shi, Dingfeng, Liu, Jiaheng, Jiang, Yuchen Eleanor, Liu, Minghao, Zhang, Ge, Zhou, Wangchunshu
In recent years, research on large language models (LLMs) and agents has increasingly shifted from demonstrating novel capabilities to complex reasoning and challenging tasks. However, current evaluations focus mainly on math/code contests or general tasks, while multi-domain academic benchmarks lack sufficient reasoning depth, leaving the field without a rigorous benchmark for high-level reasoning. To fill this gap, we introduce the Acadreason benchmark, designed to evaluate the ability of LLMs and agents to acquire and reason over academic knowledge. It consists of 50 expert-annotated academic problems across five high-reasoning domains: computer science, economics, law, mathematics, and philosophy. All questions are sourced from top-tier publications in recent years and undergo rigorous annotation and quality control to ensure they are both challenging and answerable. We conduct systematic evaluations of over 10 mainstream LLMs and agents. Most LLMs score below 20 points, with even the cutting-edge GPT-5 achieving only 16 points; agents score higher, but none exceeds 40 points. These results demonstrate the gap between the current capabilities of LLMs and agents and the demands of high-level academic research tasks, and highlight the difficulty of Acadreason.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Netherlands > South Holland > The Hague (0.04)
- Europe > Monaco (0.04)
- (2 more...)
If you can distinguish, you can express: Galois theory, Stone--Weierstrass, machine learning, and linguistics
Blum-Smith, Ben, Brugman, Claudia, Conners, Thomas, Villar, Soledad
This essay develops a parallel between the Fundamental Theorem of Galois Theory and the Stone--Weierstrass theorem: both can be viewed as assertions that tie the distinguishing power of a class of objects to their expressive power. We provide an elementary theorem connecting the relevant notions of "distinguishing power". We also discuss machine learning and data science contexts in which these theorems, and more generally the theme of links between distinguishing power and expressive power, appear. Finally, we discuss the same theme in the context of linguistics, where it appears as a foundational principle, and illustrate it with several examples.
- Asia > Indonesia > New Guinea > Western New Guinea > Papua (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Oceania > Papua New Guinea (0.04)
- (9 more...)
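The Stone--Weierstrass half of the parallel can be made concrete. A standard statement of the theorem (this phrasing is mine, not necessarily the essay's) makes the "distinguishing power implies expressive power" reading visible:

```latex
\textbf{Theorem (Stone--Weierstrass).}
Let $X$ be a compact Hausdorff space and let
$\mathcal{A} \subseteq C(X,\mathbb{R})$ be a subalgebra that contains the
constant functions and \emph{separates points}: for all $x \neq y$ in $X$
there exists $f \in \mathcal{A}$ with $f(x) \neq f(y)$.
Then $\mathcal{A}$ is dense in $C(X,\mathbb{R})$ in the uniform norm.
```

The hypothesis is a distinguishing-power condition on $\mathcal{A}$, and the conclusion is an expressive-power condition, which is the shape of the parallel the essay draws with the Galois correspondence.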
What Do Indonesians Really Need from Language Technology? A Nationwide Survey
Kautsar, Muhammad Dehan Al, Susanto, Lucky, Wijaya, Derry, Koto, Fajri
There is an emerging effort to develop NLP for Indonesia's 700+ local languages, but progress remains costly due to the need for direct engagement with native speakers. However, it is unclear what these language communities truly need from language technology. To address this, we conduct a nationwide survey to assess the actual needs of native speakers in Indonesia. Our findings indicate that addressing language barriers, particularly through machine translation and information retrieval, is the most critical priority. Although there is strong enthusiasm for advancements in language technology, concerns around privacy, bias, and the use of public data for AI training highlight the need for greater transparency and clear communication to support broader AI adoption.
- Asia > Indonesia > Sulawesi > South Sulawesi > Makassar (0.04)
- North America > United States > California (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (34 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Education > Educational Setting (0.68)
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Aycock, Seth, Stap, David, Wu, Di, Monz, Christof, Sima'an, Khalil
Extremely low-resource (XLR) languages lack substantial corpora for training NLP models, motivating the use of all available resources such as dictionaries and grammar books. Machine Translation from One Book (Tanzer et al., 2024) suggests that prompting long-context LLMs with one grammar book enables English-Kalamang translation, an unseen XLR language - a noteworthy case of linguistic knowledge helping an NLP task. We investigate whether the book's grammatical explanations or its parallel examples are most effective for learning XLR translation, finding almost all improvement stems from the parallel examples. Further, we find similar results for Nepali, a seen low-resource language, and achieve performance comparable to an LLM with a grammar book by simply fine-tuning an encoder-decoder translation model. We then investigate where grammar books help by testing two linguistic tasks, grammaticality judgment and gloss prediction, and we explore what kind of grammatical knowledge helps by introducing a typological feature prompt that achieves leading results on these more relevant tasks. We thus emphasise the importance of task-appropriate data for XLR languages: parallel examples for translation, and grammatical data for linguistic tasks. As we find no evidence that long-context LLMs can make effective use of grammatical explanations for XLR translation, we suggest data collection for multilingual XLR tasks such as translation is best focused on parallel data over linguistic description.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (22 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
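The comparison at the heart of the abstract above, parallel examples versus grammatical explanations in the prompt, can be sketched as a prompt builder that includes one, the other, or both. The function name and template wording are illustrative, not the actual prompts used by Tanzer et al. or by this paper.

```python
# Illustrative prompt builder for XLR translation, contrasting the two kinds
# of grammar-book content the paper compares. Template wording is hypothetical.

def build_xlr_prompt(source, parallel=None, grammar=None, lang="Kalamang"):
    """Assemble a translation prompt from optional parallel examples and/or
    grammar-book passages."""
    parts = []
    if grammar:  # grammatical explanations, e.g. excerpts from the grammar book
        parts.append("Grammar notes:\n" + "\n".join(grammar))
    if parallel:  # parallel examples, which the paper finds do most of the work
        parts.append("Examples:\n" + "\n".join(f"{s} => {t}" for s, t in parallel))
    parts.append(f"Translate into {lang}: {source}")
    return "\n\n".join(parts)
```

Ablating one argument at a time reproduces the shape of the paper's comparison: an examples-only prompt versus a grammar-only prompt for the same source sentence.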
You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes
Magomere, Jabez, Ishida, Shu, Afonja, Tejumade, Salama, Aya, Kochin, Daniel, Yuehgoh, Foutse, Hamzaoui, Imane, Sefala, Raesetje, Alaagib, Aisha, Semenova, Elizaveta, Crais, Lauren, Hall, Siobhan Mackenzie
Foundation models are increasingly ubiquitous in our daily lives, used in everyday tasks such as text-image searches, interactions with chatbots, and content generation. As use increases, so does concern over the disparities in performance and fairness of these models for different people in different parts of the world. To assess these growing regional disparities, we present World Wide Dishes, a mixed text and image dataset consisting of 765 dishes, with dish names collected in 131 local languages. World Wide Dishes has been collected purely through human contribution and decentralised means, by creating a website widely distributed through social networks. Using the dataset, we demonstrate a novel means of operationalising capability and representational biases in foundation models such as language models and text-to-image generative models. We enrich these studies with a pilot community review to understand, from a first-person perspective, how these models generate images for people in five African countries and the United States. We find that these models generally do not produce quality text and image outputs of dishes specific to different regions. This is true even for the US, which is typically considered to be more well-resourced in training data - though the generation of US dishes does outperform that of the investigated African countries. The models demonstrate a propensity to produce outputs that are inaccurate as well as culturally misrepresentative, flattening, and insensitive. These failures in capability and representational bias have the potential to further reinforce stereotypes and disproportionately contribute to erasure based on region. The dataset and code are available at https://github.com/oxai/world-wide-dishes/.
- North America > United States (0.88)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Africa > Democratic Republic of the Congo (0.14)
- (98 more...)
- Information Technology > Security & Privacy (1.00)
- Law (0.92)
- Government (0.92)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)
Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models
Min, Qingkai, Guo, Qipeng, Hu, Xiangkun, Huang, Songfang, Zhang, Zheng, Zhang, Yue
Cross-document event coreference resolution (CDECR) involves clustering event mentions across multiple documents that refer to the same real-world events. Existing approaches utilize fine-tuning of small language models (SLMs) like BERT to address the compatibility among the contexts of event mentions. However, due to the complexity and diversity of contexts, these models are prone to learning simple co-occurrences. Recently, large language models (LLMs) like ChatGPT have demonstrated impressive contextual understanding, yet they encounter challenges in adapting to specific information extraction (IE) tasks. In this paper, we propose a collaborative approach for CDECR, leveraging the capabilities of both a universally capable LLM and a task-specific SLM. The collaborative strategy begins with the LLM accurately and comprehensively summarizing events through prompting. Then, the SLM refines its learning of event representations based on these insights during fine-tuning. Experimental results demonstrate that our approach surpasses the performance of both the large and small language models individually, forming a complementary advantage. Across various datasets, our approach achieves state-of-the-art performance, underscoring its effectiveness in diverse scenarios.
- Asia > Singapore (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Indonesia > New Guinea > Western New Guinea > Papua (0.04)
- (18 more...)
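The two-step collaboration the abstract describes (LLM summarization by prompting, then SLM scoring of enriched mention pairs) can be sketched as follows. `call_llm` and `slm_scorer` are hypothetical stand-ins for components the abstract does not fully specify, and the prompt wording and `[SUMMARY]`/`[SEP]` markers are illustrative.

```python
# Sketch of the collaborative CDECR strategy: an LLM first summarizes each
# event mention, then a task-specific small model scores mention pairs whose
# text has been enriched with those summaries.

def summarize_event(call_llm, mention, context):
    """Step 1: prompt the LLM for an accurate, comprehensive event summary."""
    prompt = (
        "Summarize the event in one sentence, naming its participants, "
        f"time, and place.\nContext: {context}\nMention: {mention}"
    )
    return call_llm(prompt)

def pair_input(m1, s1, m2, s2):
    """Step 2: build the enriched pairwise input the SLM is fine-tuned on."""
    return f"{m1} [SUMMARY] {s1} [SEP] {m2} [SUMMARY] {s2}"

def coreferent(slm_scorer, m1, s1, m2, s2, threshold=0.5):
    """Decision for one pair: coreferent if the SLM's score clears the
    threshold; clustering then groups mentions whose pairs are coreferent."""
    return slm_scorer(pair_input(m1, s1, m2, s2)) >= threshold
```

The division of labour follows the abstract: the LLM contributes contextual understanding through its summaries, while the SLM contributes task-specific discrimination learned during fine-tuning.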
A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution
Ding, Bowen, Min, Qingkai, Ma, Shengkun, Li, Yingjie, Yang, Linyi, Zhang, Yue
Based on Pre-trained Language Models (PLMs), event coreference resolution (ECR) systems have demonstrated outstanding performance in clustering coreferential events across documents. However, the state-of-the-art system exhibits an excessive reliance on the 'triggers lexical matching' spurious pattern in the input mention pair text. We formalize the decision-making process of the baseline ECR system using a Structural Causal Model (SCM), aiming to identify spurious and causal associations (i.e., rationales) within the ECR task. Leveraging the debiasing capability of counterfactual data augmentation, we develop a rationale-centric counterfactual data augmentation method with LLM-in-the-loop. This method is specialized for pairwise input in the ECR system, where we conduct direct interventions on triggers and context to mitigate the spurious association while emphasizing the causation.
- North America > United States > Missouri > Jackson County > Kansas City (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- (28 more...)
- Research Report (1.00)
- Personal > Obituary (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Sports > Soccer (0.92)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
AI-Generated Faces in the Real World: A Large-Scale Case Study of Twitter Profile Images
Ricker, Jonas, Assenmacher, Dennis, Holz, Thorsten, Fischer, Asja, Quiring, Erwin
Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking. In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.
- Asia > Russia (0.14)
- Europe > Ukraine (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (12 more...)
- Media (1.00)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
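The "multi-stage detection pipeline" mentioned above can be sketched generically as a filter cascade, where cheap stages prune the candidate stream before expensive ones run. The stage functions here are placeholders, not the paper's actual detectors.

```python
# Generic filter cascade: each candidate passes through increasingly
# expensive stages and is dropped at the first stage that rejects it,
# so costly checks only run on items that survived the cheap ones.

def cascade(candidates, stages):
    """Return the candidates that survive every stage, in input order."""
    flagged = []
    for item in candidates:
        # all() short-circuits, so later (costlier) stages are skipped
        # as soon as an earlier stage rejects the item.
        if all(stage(item) for stage in stages):
            flagged.append(item)
    return flagged
```

For a measurement study of this kind, the stages might be ordered from a fast metadata filter to a slower image classifier, but that ordering is an assumption of this sketch.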
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models
Liu, Emmy, Neubig, Graham, Andreas, Jacob
Modern language models (LMs) can learn to perform new tasks in different ways: in instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly with a small number of examples; in instruction inference, LMs are presented with in-context examples and are then prompted to generate a natural language task description before making predictions. Each of these procedures may be thought of as invoking a different form of reasoning: instruction following involves deductive reasoning, few-shot prompting involves inductive reasoning, and instruction inference involves abductive reasoning. How do these different capabilities relate? Across four LMs (from the gpt and llama families) and two learning problems (involving arithmetic functions and machine translation) we find a strong dissociation between the different types of reasoning: LMs can sometimes learn effectively from few-shot prompts even when they are unable to explain their own prediction rules; conversely, they sometimes infer useful task descriptions while completely failing to learn from human-generated descriptions of the same task. Our results highlight the non-systematic nature of reasoning even in some of today's largest LMs, and underscore the fact that very different learning mechanisms may be invoked by seemingly similar prompting procedures.
- Asia > Indonesia > New Guinea > Western New Guinea > West Papua (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Japan (0.04)
- (8 more...)
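The three prompting regimes contrasted above can be sketched as prompt templates for a toy task. The wording below is illustrative, not the paper's actual templates, and the toy rule f(x) = 2x + 1 is my own example.

```python
# The three learning modes the abstract maps to deduction, induction, and
# abduction, shown as prompt templates for a toy function f(x) = 2x + 1.

EXAMPLES = [(1, 3), (2, 5), (4, 9)]  # input/output pairs for f(x) = 2x + 1

def deductive_prompt(rule, x):
    """Instruction following: the task description is given explicitly."""
    return f"Apply this rule: {rule}\nInput: {x}\nOutput:"

def inductive_prompt(examples, x):
    """Few-shot prompting: the task is specified implicitly by examples."""
    shots = "\n".join(f"Input: {a}\nOutput: {b}" for a, b in examples)
    return f"{shots}\nInput: {x}\nOutput:"

def abductive_prompt(examples):
    """Instruction inference: ask the model to articulate the rule before
    making predictions."""
    shots = "\n".join(f"Input: {a}\nOutput: {b}" for a, b in examples)
    return f"{shots}\nState the rule mapping inputs to outputs:"
```

The dissociation the paper reports amounts to a model succeeding on `inductive_prompt` while failing on `deductive_prompt` for the same rule, or producing a usable rule under `abductive_prompt` while unable to apply a human-written one.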
Artist Interview: Ian Kuali'i
Fleur had the pleasure of speaking with Ian Kuali'i, a multi-disciplinary self-taught artist of Hawaiian/Apache ancestry working in murals, large-scale hand-cut paper, and site-specific installations. From a single sheet of paper, using only an X-Acto blade as his tool, Ian masterfully renders portraits, journal entries, and scenes in hand-cut paper with a blend of loose urban contemporary techniques and collaged found materials. Ian describes his creative process as "the meditative process of destroying to create." Fleur: Can you please introduce yourself? My name is Ian Joseph Kekoa Hardwick-Kuali'i, or just simply Ian Kuali'i.
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.06)
- Oceania (0.05)
- North America > United States > Hawaii > Maui County > Wailuku (0.05)
- (4 more...)